2023-08-03
Long Short-Term Memory (LSTM) is a gated Recurrent Neural Network (RNN) introduced by Hochreiter and Schmidhuber (1997) to address long-term dependencies and the vanishing or exploding gradients of traditional RNNs.
By incorporating forget gates, input gates, and output gates, LSTM can selectively retain crucial information from time series data based on its features while disregarding irrelevant information.
The central idea is a memory cell which can maintain its state over time, and non-linear gating units which regulate the information flow into and out of the cell (Greff et al. 2016).
LSTM has been utilized in tasks like text categorization, sentence generation, and machine translation (Zhu et al. 2019).
Early on, linear models like Auto-Regressive (AR), Moving Average (MA), and Auto-Regressive Moving Average (ARMA) were proposed for time series forecasting (Wadhvani et al. 2017; Mo and Tao 2016; Ge and Kerrigan 2016).
The Auto-Regressive Integrated Moving Average (ARIMA) model (Ho and Xie 1998) extends ARMA with differencing operations to handle non-stationary data.
Figure 1 Architecture of a typical vanilla LSTM block (Van Houdt, Mosquera, and Nápoles 2020).
The LSTM architecture consists of a set of recurrently connected sub-networks, known as memory blocks.
The idea behind the memory block is to maintain its state over time and regulate the information flow through non-linear gating units.
The output of the block is recurrently connected back to the block input and all of the gates.
Let us assume a network comprising N processing blocks and M inputs. The forward pass in this recurrent neural system is described in six parts.
Block input. This step updates the block input component, which combines the current input \(x^{(t)}\) and the output of the LSTM unit in the previous iteration, \(y^{(t-1)}\): \[ z^{(t)} = g(W_z x^{(t)} + R_z y^{(t-1)} + b_z) \tag{1} \]
Input gate. This step updates the input gate, which combines the current input \(x^{(t)}\), the previous output of the LSTM unit \(y^{(t-1)}\), and the previous cell value \(c^{(t-1)}\): \[ i^{(t)} = \sigma(W_i x^{(t)} + R_i y^{(t-1)} + p_i \odot c^{(t-1)} + b_i) \tag{2} \]
Forget gate. In this step, the LSTM unit determines which information should be removed from its previous cell state \(c^{(t-1)}\). The activation values \(f^{(t)}\) of the forget gate at time step t are calculated from the current input \(x^{(t)}\), the output \(y^{(t-1)}\) and the state \(c^{(t-1)}\) of the memory cells at the previous time step (t-1), the peephole connections, and the bias terms \(b_f\) of the forget gate: \[ f^{(t)} = \sigma(W_f x^{(t)} + R_f y^{(t-1)} + p_f \odot c^{(t-1)} + b_f) \tag{3} \]
Cell. This step computes the cell value, which combines the block input \(z^{(t)}\), the input gate \(i^{(t)}\), and the forget gate \(f^{(t)}\) with the previous cell value:
\[ c^{(t)} = z^{(t)} \odot i^{(t)} + c^{(t-1)} \odot f^{(t)} \tag{4} \]
Output gate. This step calculates the output gate, which combines the current input \(x^{(t)}\), the previous output of the LSTM unit \(y^{(t-1)}\), and the previous cell value \(c^{(t-1)}\):
\[ o^{(t)} = \sigma(W_o x^{(t)} + R_o y^{(t-1)} + p_o \odot c^{(t-1)} + b_o) \tag{5} \]
Block output. Finally, we calculate the block output, which combines the current cell value \(c^{(t)}\) with the current output gate value:
\[ y^{(t)} = h(c^{(t)}) \odot o^{(t)} \tag{6} \]
In the above steps, \(\sigma\), \(g\) and \(h\) denote point-wise non-linear activation functions, and \(\odot\) denotes point-wise multiplication. \(W\), \(R\), \(p\) and \(b\) are the input weights, recurrent weights, peephole weights and biases, respectively.
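The six parts of the forward pass can be traced in a few lines of code. The following is a minimal Python sketch, not the implementation used in this report: it follows equations (1) to (6) as written in this section, uses plain lists instead of a tensor library, and assumes tanh for both the block input and block output activations. The names `lstm_step`, `zero_params`, `matvec` and the parameter dictionary layout are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(v):
    """Logistic function used for the three gates."""
    return 1.0 / (1.0 + math.exp(-v))

def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def zero_params(n_blocks, n_inputs):
    """Hypothetical helper: all-zero weights for N blocks and M inputs."""
    params = {}
    for name in "zifo":
        params["W_" + name] = [[0.0] * n_inputs for _ in range(n_blocks)]
        params["R_" + name] = [[0.0] * n_blocks for _ in range(n_blocks)]
        params["b_" + name] = [0.0] * n_blocks
        if name != "z":  # the block input has no peephole term
            params["p_" + name] = [0.0] * n_blocks
    return params

def lstm_step(x, y_prev, c_prev, params, g=math.tanh, h=math.tanh):
    """One forward step of a peephole LSTM block; returns (y, c)."""
    def pre_activation(name, use_peephole):
        # W x + R y_prev + b, plus p * c_prev (point-wise) for the gates
        total = [wx + ry + b for wx, ry, b in zip(
            matvec(params["W_" + name], x),
            matvec(params["R_" + name], y_prev),
            params["b_" + name])]
        if use_peephole:
            total = [t + p * c
                     for t, p, c in zip(total, params["p_" + name], c_prev)]
        return total

    z = [g(a) for a in pre_activation("z", False)]        # (1) block input
    i = [sigmoid(a) for a in pre_activation("i", True)]   # (2) input gate
    f = [sigmoid(a) for a in pre_activation("f", True)]   # (3) forget gate
    c = [zk * ik + ck * fk
         for zk, ik, ck, fk in zip(z, i, c_prev, f)]      # (4) cell
    o = [sigmoid(a) for a in pre_activation("o", True)]   # (5) output gate, c_prev as in the text
    y = [h(ck) * ok for ck, ok in zip(c, o)]              # (6) block output
    return y, c

# One block, one input, all weights zero (illustrative values only)
params = zero_params(n_blocks=1, n_inputs=1)
y, c = lstm_step(x=[0.3], y_prev=[0.0], c_prev=[1.0], params=params)
print(y, c)
```

With all weights set to zero, every gate evaluates to 0.5 and the block input to 0, so the cell value is halved from 1.0 to 0.5. This makes the role of the forget gate easy to see before any training is involved.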
The weather forecasting dataset for Indian climate in the city of Delhi, India, covers a period from 1st January 2013 to 24th April 2017.
The dataset includes four parameters that describe the weather conditions during this period: mean temperature, humidity, wind speed, and mean pressure.
We considered only mean temperature (meantemp) for this analysis, so all other attributes were removed from the dataset.
The first six observations of the dataset are:
       date   meantemp
1  1/1/2013  10.000000
2  1/2/2013   7.400000
3  1/3/2013   7.166667
4  1/4/2013   8.666667
5  1/5/2013   6.000000
6  1/6/2013   7.000000
About the Dataset: The dataset was collected from the Weather Underground API and prepared as a part of Assignment 4 of the Data Analytics Course in 2019 at PES University, Bangalore. It is important to note that the ownership and credit for this dataset belong to Weather Underground due to its data source.
The time series plot of the dataset is given below, with date on the x-axis and meantemp on the y-axis.
We used a min-max transformation for data preparation. The model is a simple LSTM with one LSTM layer and a Dense layer as the output layer. The model is then compiled with the chosen loss function, optimizer and metrics. The ts.lstm function used here comes from a package based on the Keras and TensorFlow modules (Paul and Garai 2021).
We applied the min-max transformation to the mean temperature to rescale it to the range [-1, 1], because the model uses the tanh activation function.
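As an illustration of this rescaling, here is a minimal Python sketch of a min-max transformation to [-1, 1] together with its inverse. It is not the code used in this report; the function names are hypothetical, and the example scales only the first six observations, whereas the analysis would use the minimum and maximum of the whole series.

```python
def min_max_scale(values, lo=-1.0, hi=1.0):
    """Linearly map values so that min(values) -> lo and max(values) -> hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant series")
    span = v_max - v_min
    return [lo + (v - v_min) * (hi - lo) / span for v in values]

def inverse_min_max(scaled, v_min, v_max, lo=-1.0, hi=1.0):
    """Map scaled values back to the original units."""
    return [v_min + (s - lo) * (v_max - v_min) / (hi - lo) for s in scaled]

# First six meantemp observations from the dataset (illustration only)
first_six = [10.0, 7.4, 7.166667, 8.666667, 6.0, 7.0]
scaled = min_max_scale(first_six)
restored = inverse_min_max(scaled, min(first_six), max(first_six))
print(scaled)
```

In this small sample the largest value (10.0) maps to 1 and the smallest (6.0) maps to -1. The inverse function is what converts forecasts made on the transformed scale back to degrees.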
Usage: ts.lstm(ts = df$transformed, tsLag = 5, LSTMUnits = 7, DropoutRate = 0.1, Epochs = 10, CompLoss = "mse", CompMetrics = "mae", ActivationFn = "tanh", SplitRatio = 0.99, ValidationSplit = 0.2)
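To make the roles of `tsLag` and `SplitRatio` concrete, the sketch below shows one common way to turn a series into lagged input rows and to split them chronologically. It is a Python illustration under stated assumptions, not a description of what `ts.lstm` does internally: the functions `make_lagged` and `chronological_split` are hypothetical, and the exact windowing and rounding used by the package may differ.

```python
def make_lagged(series, lag):
    """Each input row holds `lag` consecutive values; the target is the next value."""
    n_rows = len(series) - lag
    inputs = [series[k:k + lag] for k in range(n_rows)]
    targets = [series[k + lag] for k in range(n_rows)]
    return inputs, targets

def chronological_split(inputs, targets, split_ratio):
    """Keep the first `split_ratio` share of rows for training, the rest for testing."""
    n_train = int(len(inputs) * split_ratio)
    train = (inputs[:n_train], targets[:n_train])
    test = (inputs[n_train:], targets[n_train:])
    return train, test

# Toy series of 11 values with a lag of 5 (illustrative values only)
series = [float(v) for v in range(11)]
inputs, targets = make_lagged(series, lag=5)
(train_x, train_y), (test_x, test_y) = chronological_split(inputs, targets, 0.5)
print(len(train_x), len(test_x))
```

The split keeps the time order intact, so the test rows always come after the training rows. With a ratio as high as 0.99, only the last few rows of the series would be held out.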
Model: "sequential"
________________________________________________________________________________
Layer (type) Output Shape Param #
================================================================================
lstm (LSTM) (None, 1, 7) 1624
dense (Dense) (None, 1, 1) 8
================================================================================
Total params: 1,632
Trainable params: 1,632
Non-trainable params: 0
________________________________________________________________________________
Model: "sequential_1"
________________________________________________________________________________
Layer (type) Output Shape Param #
================================================================================
lstm_1 (LSTM) (None, 1, 7) 1344
dense_1 (Dense) (None, 1, 1) 8
================================================================================
Total params: 1,352
Trainable params: 1,352
Non-trainable params: 0
________________________________________________________________________________
Model: "sequential_2"
________________________________________________________________________________
Layer (type) Output Shape Param #
================================================================================
lstm_2 (LSTM) (None, 1, 7) 1064
dense_2 (Dense) (None, 1, 1) 8
================================================================================
Total params: 1,072
Trainable params: 1,072
Non-trainable params: 0
________________________________________________________________________________
Model: "sequential_3"
________________________________________________________________________________
Layer (type) Output Shape Param #
================================================================================
lstm_3 (LSTM) (None, 1, 7) 784
dense_3 (Dense) (None, 1, 1) 8
================================================================================
Total params: 792
Trainable params: 792
Non-trainable params: 0
________________________________________________________________________________
Model: "sequential_4"
________________________________________________________________________________
Layer (type) Output Shape Param #
================================================================================
lstm_4 (LSTM) (None, 1, 7) 504
dense_4 (Dense) (None, 1, 1) 8
================================================================================
Total params: 512
Trainable params: 512
Non-trainable params: 0
________________________________________________________________________________
Model: "sequential_5"
________________________________________________________________________________
Layer (type) Output Shape Param #
================================================================================
lstm_5 (LSTM) (None, 1, 7) 364
dense_5 (Dense) (None, 1, 1) 8
================================================================================
Total params: 372
Trainable params: 372
Non-trainable params: 0
________________________________________________________________________________
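The parameter counts in the six summaries above can be checked by hand. The Python sketch below assumes that each model receives one time step with `lag` features (consistent with the output shape (None, 1, 7)) and that the Keras LSTM layer has four weight sets without peephole connections; the function names are hypothetical.

```python
def lstm_param_count(units, input_dim):
    """Four weight sets (block input and three gates), each with
    input weights, recurrent weights and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(n_inputs, n_outputs):
    """One weight per input-output pair plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

for lag in (50, 40, 30, 20, 10, 5):
    lstm = lstm_param_count(units=7, input_dim=lag)
    total = lstm + dense_param_count(7, 1)
    print(lag, lstm, total)
```

Under these assumptions the lags 50, 40, 30, 20, 10 and 5 give 1624, 1344, 1064, 784, 504 and 364 LSTM parameters, matching the summaries in the order shown, and the Dense layer always contributes 8.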
        RMSE    MAPE
Train   0.0787  0.1218
Test    0.1014  0.1255
As the graphs show, among the lags 50, 40, 30, 20, 10 and 5, the best prediction is obtained for Lag=5, where the predictions follow the trend of the actual data. We therefore select Lag=5 for our model.
Time series prediction with the LSTM model shows promising forecasting accuracy. On the training data, the root mean square error (RMSE) of 0.0800 and the mean absolute percentage error (MAPE) of 0.1258 indicate that the model captured the underlying patterns and trends.
On the test dataset, we obtained an RMSE of 0.1039 and a MAPE of 0.1274.
The test errors are only slightly higher than the training errors, which suggests that the model generalizes to unseen data.
Overall, the relatively low RMSE and MAPE in both the training and test phases indicate that the LSTM model handles this time series well.
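For reference, the two error measures discussed above can be computed as in the following Python sketch. It is an illustration, not the code that produced the reported values: the function names are hypothetical, and MAPE is returned as a fraction (multiply by 100 for a percentage).

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction.
    Raises ZeroDivisionError if any actual value is zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios)

# Toy values, not taken from the dataset
actual = [2.0, 4.0]
predicted = [1.0, 5.0]
print(rmse(actual, predicted), mape(actual, predicted))
```

RMSE is expressed in the units of the series it is computed on, while MAPE is relative to the actual values, so MAPE becomes unstable when actual values are close to zero.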
• Ownership and collection credit for this dataset go to the Weather Underground API.
• Special thanks to Dr. Achraf Cohen for all his guidance throughout the project.